Ratings are overrated!
Are ratings of any use in human–computer interaction and user studies at large? If ratings are of limited use, is there a better alternative for quantitative subjective assessment? Beyond the intrinsic shortcomings of human reporting, there are a number of supplementary limitations and fundamental methodological flaws associated with rating-based questionnaires – i.e., questionnaires that ask participants to rate their level of agreement with a given statement, such as a Likert item. While the effect of these pitfalls has been largely downplayed, recent findings from diverse areas of study question the reliability of using ratings. Rank-based questionnaires – i.e., questionnaires that ask participants to rank two or more options – appear as the evident alternative that not only eliminates the core limitations of ratings but also simplifies the use of sound methodologies that yield more reliable models of the underlying reported construct: user emotion, preference, or opinion. This paper solicits recent findings from various disciplines interlinked with psychometrics and offers a quick guide for the use, processing, and analysis of rank-based questionnaires for the unique advantages they offer. The paper challenges the traditional state-of-practice in human–computer interaction and psychometrics, directly contributing toward a paradigm shift in subjective reporting.
Learning deep physiological models of affect
Feature extraction and feature selection are crucial
phases in the process of affective modeling. Both, however,
incorporate substantial limitations that hinder the development
of reliable and accurate models of affect. For the purpose of
modeling affect manifested through physiology, this paper builds
on recent advances in machine learning with deep learning
(DL) approaches. The efficiency of DL algorithms that train
artificial neural network models is tested and compared against
standard feature extraction and selection approaches followed
in the literature. Results on a game data corpus — containing
players’ physiological signals (i.e. skin conductance and blood
volume pulse) and subjective self-reports of affect — reveal that
DL outperforms manual ad-hoc feature extraction as it yields
significantly more accurate affective models. Moreover, it appears
that DL meets and even outperforms affective models that are
boosted by automatic feature selection, for several of the scenarios
examined. As the DL method is generic and applicable to any
affective modeling task, the key findings of the paper suggest
that ad-hoc feature extraction and, to a lesser degree, feature
selection could be bypassed.
The authors would like to thank Tobias Mahlmann for his
work on the development and administration of the cluster
used to run the experiments. Special thanks for proofreading
go to Yana Knight. Thanks also go to the Theano development
team, to all participants in our experiments, and to
Ubisoft, NSERC and Canada Research Chairs for funding.
This work is funded, in part, by the ILearnRW (project no.
318803) and the C2Learn (project no. 318480) FP7 ICT EU
projects.
Don't classify ratings of affect; rank them!
How should affect be appropriately annotated and how should machine learning best be employed to map
manifestations of affect to affect annotations? What is the use of ratings of affect for the study of affective computing and
how should we treat them? These are the key questions this paper attempts to address by investigating the impact of dissimilar
representations of annotated affect on the efficacy of affect modelling. In particular, we compare several different binary-class
and pairwise preference representations for automatically learning from ratings of affect. The representations are compared and
tested on three datasets: one synthetic dataset (testing “in vitro”) and two affective datasets (testing “in vivo”). The synthetic
dataset couples a number of attributes with generated rating values. The two affective datasets contain physiological and
contextual user attributes, and speech attributes, respectively; these attributes are coupled with ratings of various affective
and cognitive states. The main results of the paper suggest that ratings (when used) should be naturally transformed to ordinal
(ranked) representations for obtaining more reliable and generalisable models of affect. The findings of this paper have a direct
impact on affect annotation and modelling research but, most importantly, challenge the traditional state-of-practice in affective
computing and psychometrics at large.
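The transformation from ratings to an ordinal (ranked) representation can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes that ratings are compared only within one annotator's session, and the function name `ratings_to_preferences` and the `margin` parameter (a hypothetical threshold below which two ratings count as "no clear preference") are introduced here purely for illustration.

```python
from itertools import combinations


def ratings_to_preferences(ratings, margin=0):
    """Convert one annotator's ratings into pairwise preferences.

    Returns a list of (preferred_index, other_index) tuples. Pairs whose
    ratings differ by no more than `margin` are skipped as ambiguous.
    `margin` is a hypothetical parameter, not taken from the paper.
    """
    preferences = []
    for i, j in combinations(range(len(ratings)), 2):
        diff = ratings[i] - ratings[j]
        if diff > margin:
            preferences.append((i, j))  # item i is rated clearly higher
        elif diff < -margin:
            preferences.append((j, i))  # item j is rated clearly higher
    return preferences


# Example: four stimuli rated by the same participant on a 1-5 scale.
session_ratings = [4, 2, 4, 5]
print(ratings_to_preferences(session_ratings))
print(ratings_to_preferences(session_ratings, margin=1))
```

The resulting pairs could then be passed to any preference-learning method. Only the relative order within the session is kept, so the absolute rating values no longer matter.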
Analysing the relevance of experience partitions to the prediction of players’ self-reports of affect
A common practice in modeling affect from physiological signals consists of reducing the signals to a set of statistical features that feed predictors of self-reported emotions. This paper analyses the impact of various time windows, used for the extraction of physiological features, on the accuracy of affective models of players in a simple 3D game. Results show that the signals recorded in the central part of a short gaming experience contain more relevant information for the prediction of positive affective states than the starting and ending parts, while the information relevant to predicting anxiety and frustration appears not to be localized in a specific time interval but rather to depend on particular game stimuli.
Deep multimodal fusion : combining discrete events and continuous signals
Multimodal datasets often feature a combination of continuous signals and a series of discrete events. For instance, when
studying human behaviour it is common to annotate actions
performed by the participant over several other modalities
such as video recordings of the face or physiological signals.
These events are nominal, infrequent and not sampled
at a continuous rate, while signals are numeric and often
sampled at short fixed intervals. This fundamentally different nature complicates the analysis of the relation among
these modalities, which is often studied after each modality
has been summarised or reduced.
This paper investigates a novel approach to model the
relation between such modality types bypassing the need
for summarising each modality independently of each other.
For that purpose, we introduce a deep learning model based
on convolutional neural networks and adapted to process
multiple modalities at different time resolutions, which we name
deep multimodal fusion. Furthermore, we introduce and
compare three alternative methods (convolution, training
and pooling fusion) to integrate sequences of events with
continuous signals within this model. We evaluate deep multimodal fusion using a game user dataset where player physiological signals are recorded in parallel with game events.
Results suggest that the proposed architecture can appropriately capture multimodal information as it yields higher
prediction accuracies compared to single-modality models.
In addition, it appears that pooling fusion, based on a novel
filter-pooling method, provides the most effective fusion approach for the investigated types of data.
Mining multimodal sequential patterns : a case study on affect detection
Temporal data from multimodal interaction such as speech and bio-signals cannot be easily analysed without a preprocessing phase through which some key characteristics of the signals are extracted. Typically, standard statistical signal features such as average values are calculated prior to the analysis and, subsequently, are presented either to a multimodal fusion mechanism or a computational model of the interaction. This paper proposes a feature extraction methodology which is based on frequent sequence mining within and across multiple modalities of user input. The proposed method is applied for the fusion of physiological signals and gameplay information in a game survey dataset. The obtained sequences are analysed and used as predictors of user affect resulting in computational models of equal or higher accuracy compared to the models built on standard statistical features.
Multimodal PTSD characterization via the StartleMart game
Computer games have recently shown promise
as a diagnostic and treatment tool for psychiatric rehabilitation. This paper examines the potential of combining multiple modalities for detecting affective responses
of patients interacting with a simulation built on game
technology, aimed at the treatment of mental diagnoses
such as Post Traumatic Stress Disorder (PTSD). For
that purpose, we couple game design and game technology to create a game-based tool for exposure therapy and stress inoculation training that utilizes stress
detection for the automatic profiling and potential personalization of PTSD treatments. The PTSD treatment
game we designed forces the player to go through various stressful experiences while a stress detection mechanism profiles the severity and type of PTSD by analyzing the physiological responses to those in-game
stress elicitors in two separate modalities: skin conductance (SC) and blood volume pulse (BVP). SC is often
used to monitor stress as it is connected to the activation of the sympathetic nervous system (SNS). By including BVP in the model we introduce information
about para-sympathetic activation, which offers a more
complete view of the psycho-physiological experience
of the player; in addition, as BVP is also modulated
by SNS, a multimodal model should be more robust
to changes in each modality due to particular drugs or
day-to-day bodily changes. Overall, the study and analysis of 14 PTSD-diagnosed veteran soldiers presented in
this paper reveals correspondence between diagnostic
standard measures of PTSD severity and SC and BVP
responsiveness and feature combinations thereof. The
study also reveals that these features are significantly
correlated with subjective evaluations of the stressfulness of experiences, represented as pairwise preferences.
More importantly, the results presented here demonstrate that using the modalities of skin conductance and
blood volume pulse captures a more nuanced representation of player stress responses than using skin conductance alone. We conclude that the results support
the use of the simulation as a relevant treatment tool
for stress inoculation training, and suggest the feasibility of using such a tool to profile PTSD patients. The
use of multiple modalities appears to be key for an accurate profiling, although further research and analysis
are required to identify the most relevant physiological
features for capturing user stress.
Validating generic metrics of fairness in game-based resource allocation scenarios with crowdsourced annotations
Being able to effectively measure the notion of fairness is of vital importance as it can provide insight into the formation and evolution of complex patterns and phenomena, such as social preferences, collaboration, group structures and social conflicts. This paper presents a comparative study for quantitatively modelling the notion of fairness in one-to-many resource allocation scenarios - i.e. one provider agent has to allocate resources to multiple receiver agents. For this purpose, we investigate the efficacy of six metrics and cross-validate them on crowdsourced human ranks of fairness annotated through a computer game implementation of the one-to-many resource allocation scenario. Four of the fairness metrics examined are well-established metrics of data dispersion, namely standard deviation, normalised entropy, the Gini coefficient and the fairness index. The fifth metric, proposed by the authors, is an ad-hoc context-based measure which is based on key aspects of distribution strategies. The sixth metric, finally, is machine learned via ranking support vector machines (SVMs) on the crowdsourced human perceptions of fairness. Results suggest that all ad-hoc designed metrics correlate well with the human notion of fairness, and the context-based metrics we propose appear to have a predictability advantage over the other ad-hoc metrics. On the other hand, the normalised entropy and fairness index metrics appear to be the most expressive and generic for measuring fairness for the scenario adopted in this study and beyond. The SVM model can automatically model fairness more accurately than any ad-hoc metric examined (with an accuracy of 81.86%) but it is limited by its expressivity and generalisability.
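The four dispersion-based metrics named above can be sketched on a single allocation vector. This is a minimal illustration using textbook definitions, not the paper's implementation: it assumes that "the fairness index" refers to Jain's fairness index, that allocations are non-negative with a positive total, and that there are at least two receivers. All function names are hypothetical.

```python
import math


def _check(alloc):
    # Assumption: at least two receivers and a positive total allocation.
    if len(alloc) < 2 or sum(alloc) <= 0:
        raise ValueError("need >= 2 receivers and a positive total")


def std_dev(alloc):
    """Population standard deviation; 0 means a perfectly even split."""
    _check(alloc)
    mean = sum(alloc) / len(alloc)
    return math.sqrt(sum((x - mean) ** 2 for x in alloc) / len(alloc))


def normalised_entropy(alloc):
    """Shannon entropy of the shares divided by its maximum, log(n)."""
    _check(alloc)
    total = sum(alloc)
    shares = [x / total for x in alloc if x > 0]  # 0 * log(0) treated as 0
    return -sum(p * math.log(p) for p in shares) / math.log(len(alloc))


def gini(alloc):
    """Gini coefficient via the mean absolute difference formulation."""
    _check(alloc)
    n, total = len(alloc), sum(alloc)
    diff_sum = sum(abs(a - b) for a in alloc for b in alloc)
    return diff_sum / (2 * n * total)


def jain_index(alloc):
    """Jain's fairness index (assumed to be the paper's 'fairness index')."""
    _check(alloc)
    return sum(alloc) ** 2 / (len(alloc) * sum(x * x for x in alloc))


# Example: an even split versus giving everything to one receiver.
for allocation in ([5, 5, 5, 5], [20, 0, 0, 0]):
    print(allocation, std_dev(allocation), normalised_entropy(allocation),
          gini(allocation), jain_index(allocation))
```

The metrics point in opposite directions: standard deviation and the Gini coefficient grow with inequality, whereas normalised entropy and Jain's index grow with equality. Any comparison against human ranks has to account for this sign.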
Genetic search feature selection for affective modeling : a case study on reported preferences
Automatic feature selection is a critical step towards the generation of successful computational models of affect. This paper presents a genetic search-based feature selection method which is developed as a global-search algorithm for improving the accuracy of the affective models built. The method is tested and compared against sequential forward feature selection and random search in a dataset derived from a game survey experiment which contains bimodal input features (physiological and gameplay) and expressed pairwise preferences of affect. Results suggest that the proposed method is capable of picking subsets of features that generate more accurate affective models.
Generic physiological features as predictors of player experience
This paper examines the generality of features extracted from heart rate (HR) and skin conductance (SC) signals as predictors of self-reported player affect expressed as pairwise preferences. Artificial neural networks are trained to accurately map physiological features to expressed affect in two dissimilar and independent game surveys. The performance of the obtained affective models which are trained on one game is tested on the unseen physiological and self-reported data of the other game. Results in this early study suggest that there exist features of HR and SC, such as average HR and one- and two-step SC variation, that are able to predict affective states across games of different genres and dissimilar game mechanics.
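The features mentioned in this abstract can be illustrated with a short sketch. The abstract does not define "one- and two-step variation", so the definition used here is an assumption: the mean absolute difference between samples that lie k steps apart. The function names are hypothetical and the signal values are made up for the example.

```python
def mean_value(signal):
    """Average of a sampled signal, e.g. average heart rate."""
    return sum(signal) / len(signal)


def k_step_variation(signal, k):
    """Mean absolute difference between samples k steps apart.

    This definition is an assumption for illustration; the paper may
    define signal variation differently.
    """
    if k < 1 or len(signal) <= k:
        raise ValueError("signal must contain more than k samples")
    diffs = [abs(signal[i + k] - signal[i]) for i in range(len(signal) - k)]
    return sum(diffs) / len(diffs)


# Made-up samples standing in for one game session.
heart_rate = [60, 70, 80]
skin_conductance = [1.0, 2.0, 4.0, 3.0]

features = {
    "avg_hr": mean_value(heart_rate),
    "sc_one_step": k_step_variation(skin_conductance, 1),
    "sc_two_step": k_step_variation(skin_conductance, 2),
}
print(features)
```

Feature vectors of this kind, computed per session, would be the inputs that a model maps to the reported pairwise preferences.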